Bin-Hash Indexing: A Parallel Method For Fast Query Processing

Authors

  • Luke J. Gosink
  • Kesheng Wu
  • E. Wes Bethel
  • John D. Owens
  • Kenneth I. Joy
Abstract

This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency and is therefore well suited to emerging commodity parallel processors, such as multi-cores, cell processors, and general-purpose graphics processing units (GPUs). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first use the bin boundaries to determine whether a record satisfies the query conditions. For records in bins that cannot be resolved this way, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency: every record can be evaluated independently and in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism; over 12,000 records can be evaluated simultaneously. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.
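The two-stage evaluation described in the abstract can be illustrated with a small sequential sketch. This is not the authors' CUDA implementation: the function names (`build_index`, `evaluate_record`, `range_query`) are hypothetical, the query is assumed to be a half-open range `lo <= v < hi`, and a plain Python dictionary per bin stands in for the perfect spatial hash table. The point it shows is that each record's answer depends only on its own bin number and, for boundary bins, one table lookup, so every record could in principle be handled by an independent thread.

```python
from bisect import bisect_right


def build_index(values, boundaries):
    """Bin the base data and store each bin's values in its own table.

    boundaries is sorted; bin i covers the half-open range
    [boundaries[i], boundaries[i + 1]).
    ASSUMPTION: a dict keyed by record id stands in for the paper's
    perfect spatial hash table.
    """
    num_bins = len(boundaries) - 1
    bin_ids = []
    tables = [dict() for _ in range(num_bins)]
    for rid, v in enumerate(values):
        b = bisect_right(boundaries, v) - 1
        if b < 0 or b >= num_bins:
            raise ValueError(f"value {v} lies outside the bin boundaries")
        bin_ids.append(b)
        tables[b][rid] = v
    return bin_ids, tables


def evaluate_record(rid, bin_ids, tables, boundaries, lo, hi):
    """Decide whether one record satisfies lo <= value < hi.

    Uses only this record's bin number and, if needed, one table lookup,
    so calls for different records are independent of each other.
    """
    b = bin_ids[rid]
    b_lo, b_hi = boundaries[b], boundaries[b + 1]
    if lo <= b_lo and b_hi <= hi:
        return True   # bin lies entirely inside the query range
    if b_hi <= lo or b_lo >= hi:
        return False  # bin lies entirely outside the query range
    # Boundary bin: the bin number alone cannot resolve the record,
    # so fetch the stored value and test it directly.
    v = tables[b][rid]
    return lo <= v < hi


def range_query(bin_ids, tables, boundaries, lo, hi):
    """Sequential stand-in for the per-record parallel evaluation."""
    return [rid for rid in range(len(bin_ids))
            if evaluate_record(rid, bin_ids, tables, boundaries, lo, hi)]


values = [0.5, 1.5, 2.5, 3.5, 2.1, 2.9]
boundaries = [0.0, 1.0, 2.0, 3.0, 4.0]
bin_ids, tables = build_index(values, boundaries)
hits = range_query(bin_ids, tables, boundaries, 1.0, 2.6)
```

In this example only the records in the bin covering [2.0, 3.0) require a table lookup; the remaining records are resolved from their bin numbers alone, which is where the smaller data footprint mentioned in the abstract comes from.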


Similar resources

Which is Better for kNN Query Processing in the Cloud: Sequential or Parallel

With the development of various Cloud systems, providing powerful kNN query capability to DaaS (Database as a Service) is an essential requirement for many applications. In this paper, we are interested in two opposite approaches for processing kNN queries in Cloud systems, parallel processing and sequential processing, and we want to determine which one performs better. For addressing ...


PROuST: A Comparison Method of Three-Dimensional Structures of Proteins Using Indexing Techniques

We present a new method for protein structure comparison that combines indexing and dynamic programming (DP). The method is based on simple geometric features of triplets of secondary structures of proteins. These features provide indexes to a hash table that allows fast retrieval of similarity information for a query protein. After the query protein is matched with all proteins in the hash tab...


Using Optimized Multi-Attribute Hash Indexes for Hash Joins

The join operation is one of the most frequently used and expensive query processing operations in relational database systems. One method of joining two relations is to use a hash-based join algorithm. Hash-based join algorithms typically have two phases, a partitioning phase and a partition joining phase. We describe how an optimal multi-attribute hash (MAH) indexing scheme can be used to red...


The Query Web: Reliable Indexing In Dynamic Networks

Peer-To-Peer (P2P) networks have become popular with computer science researchers because of their compelling properties like robustness, scalability and failure resistance. Due to the lack of centralized control, several challenges arise in P2P networks: (1) fast indexing of data structures to allow efficient query processing, (2) creation, abstraction and dissemination of suitable metadata, (...


Analysis of Secondary Structure Elements of Proteins Using Indexing Techniques

In this paper we present a method for protein structure comparison that is based on indexing. Unlike most methods using indexing, ours does not use invariants of the Cα atoms of the proteins; rather, it relies on geometric properties of the secondary structures. Given a set of protein structures, we compute the angles and distances of all the triplets of linear segments associated to the secondary...




Publication date: 2008